Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection
Speech Activity Detection (SAD) plays an important role in mobile communications and automatic speech recognition (ASR). Developing efficient SAD systems for real-world applications is a challenging task due to the presence of noise. We propose a new approach to SAD in which we treat it as a two-dimensional multilabel image classification problem. To classify the audio segments, we compute their Short-Time Fourier Transform spectrograms and classify them with a Convolutional Recurrent Neural Network (CRNN), an architecture traditionally used in image recognition. Our CRNN uses a sigmoid activation function, max-pooling in the frequency domain, and a convolutional operation as a moving-average filter to remove misclassified spikes. On the development set of Task 1 of the 2019 Fearless Steps Challenge, our system achieved a decision cost function (DCF) of 2.89%, a 66.4% improvement over the baseline. Moreover, it achieved a DCF score of 3.318% on the evaluation dataset of the challenge, ranking first among all submissions.
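The moving-average post-processing step described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's code: frame-level sigmoid posteriors are smoothed by convolution with a uniform kernel so isolated misclassified spikes are suppressed before thresholding; the window length and threshold are assumed values.

```python
import numpy as np

def smooth_and_threshold(posteriors, window=5, threshold=0.5):
    """Smooth frame-level speech posteriors with a moving-average
    convolution, then threshold into speech / non-speech labels."""
    kernel = np.ones(window) / window              # uniform moving-average filter
    smoothed = np.convolve(posteriors, kernel, mode="same")
    return (smoothed >= threshold).astype(int)     # 1 = speech, 0 = non-speech

# A single spurious high score inside a non-speech run is suppressed,
# while the sustained speech region at the end survives.
frames = np.array([0.1, 0.2, 0.9, 0.1, 0.2, 0.8, 0.9, 0.85, 0.9, 0.8])
labels = smooth_and_threshold(frames)
```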
Image-based Text Classification using 2D Convolutional Neural Networks
We propose a new approach to text classification in which we consider the input text as an image and apply 2D Convolutional Neural Networks to learn the local and global semantics of the sentences from the variations of the visual patterns of words. Our approach demonstrates that it is possible to extract semantically meaningful features from images of text without the optical character recognition and sequential processing pipelines that traditional natural language processing algorithms require. To validate our approach, we present results for two applications: text classification and dialog modeling. Using a 2D Convolutional Neural Network, we were able to outperform the state-of-the-art accuracy results for a Chinese text classification task and achieved promising results on seven English text classification tasks. Furthermore, our approach outperformed memory networks without match types when using out-of-vocabulary entities from Task 4 of the bAbI dialog dataset.
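To make the "text as image" input representation concrete: the paper renders sentences as actual images of text, but a toy stand-in is enough to show the idea of a fixed-size 2D array that a 2D CNN could consume. The character-to-byte encoding and the shape below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def text_to_array(text, height=4, width=16):
    """Crude 2D 'image' of a sentence: one word per row, one character
    per column, each cell holding the normalised byte value of the
    character. Overlong words/sentences are truncated, short ones padded."""
    img = np.zeros((height, width), dtype=np.uint8)
    for r, word in enumerate(text.split()[:height]):
        for c, ch in enumerate(word[:width]):
            img[r, c] = ord(ch)
    return img / 255.0   # normalise to [0, 1] like pixel intensities

img = text_to_array("deep text classification demo")
```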
Comparing CNN and Human Crafted Features for Human Activity Recognition
Deep learning techniques such as Convolutional Neural Networks (CNNs) have shown good results in activity recognition. One of the advantages of these methods is their ability to generate features automatically, which greatly simplifies feature extraction, a task that usually requires domain-specific knowledge, especially with big data, where data-driven approaches can lead to anti-patterns. Despite this advantage, very little work has been undertaken on analyzing the quality of the extracted features, and more specifically on how model architecture and parameters affect the ability of those features to separate activity classes in the final feature space. This work focuses on identifying the optimal parameters for the recognition of simple activities, applying this approach to signals from both inertial and audio sensors. The paper provides the following contributions: (i) a comparison of automatically extracted CNN features with gold-standard Human Crafted Features (HCF); (ii) a comprehensive analysis of how architecture and model parameters affect the separation of the target classes in the feature space. Results are evaluated using publicly available datasets. In particular, we achieved a 93.38% F-Score on the UCI-HAR dataset, using 1D CNNs with 3 convolutional layers and a kernel size of 32, and a 90.5% F-Score on the DCASE 2017 development dataset, simplified to three classes (indoor, outdoor, and vehicle), using 2D CNNs with 2 convolutional layers and a 2x2 kernel size.
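The two feature families being compared can be contrasted on a synthetic 1D "accelerometer" signal. This is purely illustrative: the CNN side is reduced to a single random convolution kernel of size 32 (the paper's kernels are learned, and there are 3 layers) followed by ReLU and global max-pooling, while the handcrafted side uses classic statistics.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for an inertial sensor trace: sinusoid plus noise.
signal = np.sin(np.linspace(0, 20 * np.pi, 512)) + 0.1 * rng.standard_normal(512)

def handcrafted_features(x):
    """Classic human-crafted features: mean, standard deviation,
    and zero-crossing rate."""
    zcr = np.mean(np.abs(np.diff(np.sign(x))) > 0)
    return np.array([x.mean(), x.std(), zcr])

def conv_feature(x, kernel):
    """One 'learned' feature: 1D convolution, ReLU, global max-pool."""
    activ = np.convolve(x, kernel, mode="valid")
    return np.maximum(activ, 0).max()

kernel = rng.standard_normal(32) / 32      # kernel size 32, as in the paper's 1D CNN
hcf = handcrafted_features(signal)
cnn_feat = conv_feature(signal, kernel)
```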
Audio Content Analysis for Unobtrusive Event Detection in Smart Homes
Institute of Engineering Sciences
Environmental sound signals are multi-source, heterogeneous, and varying in time. Many systems have been proposed to process such signals for event detection in ambient assisted living applications. Typically, these systems use feature extraction, selection, and classification. However, despite major advances, several important questions remain unanswered, especially in real-world settings. This paper contributes to the body of knowledge in the field by addressing the following problems for ambient sounds recorded in various real-world kitchen environments: 1) which features and which classifiers are most suitable in the presence of background noise? 2) what is the effect of signal duration on recognition accuracy? 3) how do the signal-to-noise ratio and the distance between the microphone and the audio source affect the recognition accuracy in an environment in which the system was not trained? We show that for systems that use traditional classifiers, it is beneficial to combine gammatone frequency cepstral coefficients and discrete wavelet transform coefficients and to use a gradient boosting classifier. For systems based on deep learning, we consider 1D and 2D Convolutional Neural Networks (CNNs) using mel-spectrogram energies and mel-spectrogram images as inputs, respectively, and show that the 2D CNN outperforms the 1D CNN. We obtained competitive classification results for two such systems. The first one, which uses a gradient boosting classifier, achieved an F1-Score of 90.2% and a recognition accuracy of 91.7%. The second one, which uses a 2D CNN with mel-spectrogram images, achieved an F1-Score of 92.7% and a recognition accuracy of 96%.
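One level of a Haar discrete wavelet transform shows what the DWT coefficients combined with the gammatone cepstral features look like. Real systems typically use a wavelet library and deeper decompositions; this single Haar level is only a sketch of the approximation (low-pass) and detail (high-pass) halves.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: scaled pairwise averages (approximation
    coefficients) and pairwise differences (detail coefficients)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                              # pad odd-length input
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # local averages
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # local differences
    return approx, detail

approx, detail = haar_dwt([4.0, 6.0, 10.0, 12.0, 8.0, 6.0])
```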
Audio-based Event Recognition System for Smart Homes
Building an acoustic-based event recognition system for smart homes is a challenging task due to the lack of high-level structures in environmental sounds. In particular, the selection of effective features is still an open problem. We make an important step toward this goal by showing that the combination of Mel-Frequency Cepstral Coefficients, Zero-Crossing Rate, and Discrete Wavelet Transform features can achieve an F1 score of 96.5% and a recognition accuracy of 97.8% with a gradient boosting classifier for ambient sounds recorded in a kitchen environment.
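The feature-combination step amounts to concatenating the three feature families into one vector per clip before the classifier sees them. MFCC and DWT extraction are non-trivial, so placeholder vectors stand in for them below; only the zero-crossing rate and the concatenation are shown concretely, and all sizes are assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
audio = rng.standard_normal(1024)      # stand-in for a kitchen sound clip

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs where the sign changes."""
    return float(np.mean(np.abs(np.diff(np.sign(x))) > 0))

mfcc = rng.standard_normal(13)         # placeholder for 13 MFCCs
dwt = rng.standard_normal(8)           # placeholder DWT coefficients
# One row of the feature matrix fed to the gradient boosting classifier:
features = np.concatenate([mfcc, [zero_crossing_rate(audio)], dwt])
```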
Energy-based decision engine for household human activity recognition
We propose a framework for energy-based human activity recognition in a household environment. We apply machine learning techniques to infer the state of household appliances from their energy consumption data and use rule-based scenarios that exploit these states to detect human activity. Our decision engine achieved a 99.1% accuracy for real-world data collected in the kitchens of two smart homes.
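The two-stage decision engine can be sketched as follows. This is a hypothetical reconstruction: a simple power threshold stands in for the learned appliance-state classifier, and the appliance names, thresholds, and rules are illustrative assumptions, not the paper's.

```python
# Assumed per-appliance "on" thresholds in watts (illustrative values).
ON_THRESHOLD_W = {"kettle": 1000, "oven": 800, "fridge": 50}

def appliance_states(power_readings):
    """Stage 1: infer on/off states from instantaneous power draw."""
    return {name: power_readings.get(name, 0) >= ON_THRESHOLD_W[name]
            for name in ON_THRESHOLD_W}

def infer_activity(states):
    """Stage 2: hand-written rules mapping appliance states to activities."""
    if states["kettle"]:
        return "preparing hot drink"
    if states["oven"]:
        return "cooking"
    return "no kitchen activity"

states = appliance_states({"kettle": 1800, "oven": 0, "fridge": 90})
activity = infer_activity(states)
```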
Bacchus Long‐Term (BLT) data set: Acquisition of the agricultural multimodal BLT data set with automated robot deployment
Achieving a robust long-term deployment with mobile robots in the agriculture domain is both an in-demand and a challenging task. The possibility of having autonomous platforms in the field performing repetitive tasks, such as monitoring or harvesting crops, collides with the difficulties posed by the ever-changing appearance of the environment due to seasonality.
With this goal in mind, we report an ongoing effort in the long-term deployment of an autonomous mobile robot in a vineyard, with the main objective of acquiring what we call the Bacchus Long-Term (BLT) Dataset. This dataset consists of multiple sessions recorded in the same area of a vineyard but at different points in time, covering a total of 7 months to capture the whole canopy growth from March until September. The multimodal dataset is acquired with the main focus on pushing the development and evaluation of different mapping and localisation algorithms for long-term autonomous robot operation in the agricultural domain. Hence, besides the dataset, we also present an initial study in long-term localisation using four different sessions belonging to four different months with different plant stages. We identify that state-of-the-art localisation methods can only partially cope with the amount of change in the environment, making the proposed dataset suitable for establishing a benchmark on which the robotics community can test its methods. On our side, we anticipate two solutions aimed at extracting stable temporal features to improve long-term 4D localisation results.
The BLT dataset is available at https://lncn.ac/lcas-blt
Human–Computer Interaction through Affective Interfaces
The development of future systems capable of understanding human affective states and responding to emotional changes is the key aim of Affective Computing. In this context, Automatic Affect Recognition (AAR) is a major challenge and, at the same time, a significantly complex problem. Solutions to this problem can lead to advances in human-computer interaction and may also improve quality of life, for example through automatic stress detection systems. The present thesis focuses on human-computer interaction by means of affective interfaces. It introduces, for the first time in the field of AAR, biosignal (Galvanic Skin Response - GSR and Electrocardiogram - ECG) processing techniques based on the theory of orthogonal Legendre and Krawtchouk moments. Moreover, novel variations of these moments are proposed, with the aim of further enhancing the effectiveness of AAR systems. Furthermore, it introduces novel subject-dependent features extracted from GSR and ECG biosignals that are capable of reducing the between-subject variability that typically appears in these biosignals. Finally, it introduces new human behavioural features that can be extracted automatically from computer systems, with the aim of automatic stress detection. Focusing on future practical applications of AAR, the proposed methods were experimentally evaluated for their effectiveness in the automatic recognition of boredom, frustration, and stress. For this purpose, experiments were designed and conducted to collect data during the induction of the specific affective states. The experimental evaluation showed that the proposed methods can significantly advance the field of AAR towards the development of future practical systems. Finally, focusing on affective Graphical User Interface (GUI) design, this thesis presents the development of a GUI for a mobile device application.
Following affective design principles, the specific GUI was designed with the main aim of pleasing the end user and reducing the possibility of negative emotions (e.g. frustration) being induced by the interaction with it.
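The Legendre-moment features at the core of the thesis can be sketched for a 1D biosignal such as a GSR trace: the signal is sampled on the polynomials' orthogonality interval [-1, 1] and the n-th moment is (up to normalisation) its inner product with the n-th Legendre polynomial. The order and the constant test signal below are illustrative choices, not the thesis's configuration.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_moments(signal, max_order=5):
    """Approximate the first few Legendre moments of a 1D signal:
    moment_n ≈ (2n+1)/2 * ∫ P_n(t) s(t) dt over t in [-1, 1]."""
    signal = np.asarray(signal, dtype=float)
    t = np.linspace(-1.0, 1.0, len(signal))
    dt = t[1] - t[0]
    moments = []
    for n in range(max_order + 1):
        coeffs = np.zeros(n + 1)
        coeffs[n] = 1.0                      # coefficient vector selecting P_n
        p_n = legendre.legval(t, coeffs)     # sample P_n on the grid
        moments.append((2 * n + 1) / 2 * float(np.dot(p_n, signal)) * dt)
    return np.array(moments)

# A constant signal puts essentially all its energy in the 0th moment.
m = legendre_moments(np.ones(200))
```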